Similar resources
Research Project Report: Spark, BlinkDB and Sampling
During the 2015-2016 academic year, I conducted research on Spark, BlinkDB and various sampling techniques. This research helped the team better understand the capabilities and properties of Spark, the BlinkDB system and different sampling techniques. Additionally, I implemented and benchmarked various machine learning and sampling methods on top of both Spark and BlinkDB. The...
Scaling Spark on Lustre
We report our experiences in porting and tuning the Apache Spark data analytics framework on the Cray XC30 (Edison) and XC40 (Cori) systems, installed at NERSC. We find that design decisions made in the development of Spark are based on the assumption that Spark is constrained primarily by network latency, and that disk I/O is comparatively cheap. These assumptions are not valid on Edison or Co...
Verifying Equivalence of Spark Programs Technical Report 1-Nov-2016
Spark is a popular framework for writing large-scale data processing applications. Our goal is to develop tools for reasoning about Spark programs. This is challenging because Spark programs combine database-like relational algebraic operations and aggregate operations with user-defined functions (UDFs). We present the first technique for verifying the equivalence of Spark programs. We model S...
Metal-oxide surge arrester model for fast transient simulations
Metal-oxide surge arresters have dynamic characteristics that are significant for overvoltage coordination studies involving fast front surges. Several models with acceptable accuracy have been proposed to simulate this frequency-dependent behavior. Difficulties arise in the calculation and adjustment of the model parameters: in some cases iterative procedures are required, in others the necess...
S2RDF: RDF Querying with SPARQL on Spark
RDF has become very popular for semantic data publishing due to its flexible and universal graph-like data model. The ever-increasing size of RDF data collections therefore raises the need for scalable distributed approaches. We advocate using existing infrastructures for Big Data processing like Hadoop for this purpose. Yet, SPARQL query performance is a major challenge as Hadoop is not inte...
Journal
Journal title: Journal of the Franklin Institute
Year: 1842
ISSN: 0016-0032
DOI: 10.1016/s0016-0032(42)91280-8